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Abstract 

We report on simulations with two flavors of 0(a) improved degener- 
ate Wilson fermions with Schrodinger functional boundary conditions. The 
algorithm which is used is Hybrid Monte Carlo with two pseudo-fermion 
fields as proposed by M. Hasenbusch. We investigate the numerical preci- 
sion and sensitivity to reversibility violations of this algorithm. A gain of a 
factor two in CPU cost is reached compared with one pseudo-fermion field 
due to the larger possible step-size. 
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1 Introduction 

Algorithmic tests with dynamical fermions are large scale projects by themselves. 
It is therefore a good idea to integrate them in large scale physics projects, part of 
whose runs are dedicated to study variants of the algorithm or whose configura- 
tions are stored and used to further analyze the algorithm. In this work we present 
a study of a variant of the HMC algorithm which uses two pseudo-fermion fields 
per degenerate fiavor doublet, as proposed by M. Hasenbusch [1,2] and recently 
tested in [3] . Our simulations of full QCD are performed within the Schrodinger 
functional framework and deal with two fiavors of massless quarks. The Wilson 
plaquette action is taken for the gauge field and the 0(a) improved Wilson-Dirac 
operator for the fermions. 

The algorithmic study presented here is integrated in the ALPHA project for 
the computation of the running of the renormalized quark mass [4] . Since we use 
a finite size technique our lattices range from a very small physical size of 10~^ fm 
to an intermediate size of 1 fm. For these lattice volumes we can study the scaling 
behavior of the HMC algorithm since we can simulate different lattice spacings at 
fixed physical lattice size. 

In Section 2 we describe the main steps to build the HMC algorithm with 
two pseudo-fermion fields. We explain our choice of the factorization of the Dirac 
operator into two parts. Section 3 is dedicated to the study of the required 
numerical precision and to the reversibility of the integration of the molecular 
dynamics equations of motion. A comparison with HMC using one pseudo-fermion 
field is done for several physical lattice sizes and shows that we gain about a factor 
two in performance when using two pseudo-fermion fields. 

2 HMC with two pseudo-fermion fields 

We would hke to simulate on the lattice two fiavors of Wilson quarks in the frame- 
work of the Schrodinger functional with 0(a) improvement [5]. The Hamiltonian 
for the evolution in fictitious time is defined as 

//=i5^Tr[7r(x,/x)2] + 5eff, (2.1) 

where 7t{x,ij,) are the traceless Hermitian momenta conjugate to the gauge field 
link variables U (x, jj.) and the effective action (at first for one pseudo-fermion field) 
is given by 

Ses = S^{U) + SaetiU) + Spf{U, 0, 0t) . (2.2) 
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In eq. (2.2) Sg{U) is the Wilson plaquette gauge action with gauge coupling = 
6/P and the other two terms represent the fermionic contributions. The Wilson- 
Dirac operator is 

a{D^ + 5D + mQ) ^ , k = (8 + 2amo)"\ (2.3) 

where 5D is the correction for 0(a) improvement (clover term). We use even-odd 
preconditioning [6] and the Wilson-Dirac operator assumes the block form 

M-C , ^" V (2.4) 



1 + T, 



oo 



where the subscripts e (o) refer to the even (odd) sites of the lattice. The boundary 
values of the quark fields are set to zero and in this case the 0(a) improvement 
terms proportional to {ct — 1) do not contribute to the off-diagonal blocks [7]. 
The operators T^e and Too are diagonal in space-time and represent the 0(a) 
corrections. The Hermitian operator 75M can be written as the product 

/ 75(1 + Tee) \ / 1 (1 + Tee)-Weo \ ^) 

V 75M,e 1 J V 75{l + T,„-Moe(l + Tee)-lMeo} J ' ^ ' ^ 

We define the following Hermitian operators 

Q = co75M, co^{1 + 8k)-\ (2.6) 

Q = Co75{l + Too - Moe{l + T,e)-'M,o} , Co = (1 + 64«;2)-i . (2.7) 

The fermionic contribution to the partition function can then be written as 

det(g^) oc det(l + Teef dct . (2.8) 

Prom eq. (2.8) we get the contributions to Sgs in eq. (2.2) 

>Sdet = -2trln(l+Tee) (2.9) 

det(Q2) oc y D[0] exp(-5pF) , Spp = (p^Q-^cp , (2.10) 

in terms of one dynamical complex pseudo-fermion field (j). 

In [1,2] a modified effective pseudo-fermionic action has been proposed. By 
using the identity Q = QQ~^Q for some arbitrary invertible matrix Q, we can 
compute det((5^) by using two instead of one pseudo-fermion fields 

„ /«pf \ «pf 
det(g^) oc / exp(-^5Fj, npf = 2 (2.11) 




i=l 

Sf, = <Pl{QQ^)-'h (2.12) 
-^F^ = 4iQ-'QQ^Q-')(l>2. (2.13) 
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In this work we choose 

Q = Q-ip, (2.14) 
where p is a real number. We can then cast eq. (2.12) and eq. (2.13) into the form 

S^,^4>\[al + {Q'' + pD-']4>i, i = l,2, (2.15) 
where the field 02 has been rescaled and 

(71 = 0, pi=p, (2.16) 

a2 = -, P2 = 0. (2.17) 
P 

In the Hamiltonian eq. (2.1) the effective action is now given by 

2 

^PF = $^^F,(^,0.,0l). (2.18) 
i=l 

We determine the parameter p in eq. (2.14) by requiring that the sum K of 
the condition numbers of the operators appearing in the actions S^^ and S-p^ is 
minimal. If we use the notation Amin = ^rainiO^) i Amax = Ainax(Q^) for the lowest 
respectively highest eigenvalue of we get 

K . + .(p)^^. (2.19) 

where 

A; = ^ (2.20) 

^min 

is the condition number of Q^. The minimal K is reached for y{p) — y/k. The 
condition numbers of the operators in Sp-^ and Sp^ are then both equal to Vk. 
Solving for p we obtain p — (AminAmax)^^^- We set^ 

P= ((A„.in)(A„,ax))'/' , (2.21) 

where we denote by (■) the expectation value computed in a Monte Carlo simula- 
tion. Of practical advantage is the following relation which holds at tree level in 
the Schrodinger functional [8] 

k oc (-^ , (2.22) 



^In practice Amax fluctuates negligibly compared to An 
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where T is the temporal extension of the box. Quahtatively the same behavior 
(at the critical mass) is expected for 

in the continuum limit at fixed renormalized coupling g'^{L) = u, where L is the 
spatial extension of the box. This means that p as defined in eq. (2.21) should be 
scaled with the lattice size T/a approximately like ^J'oJT . 

The algorithm, which we use in our simulations of eq. (2.1) and eq. (2.18) is 
Hybrid Monte Carlo [9, 10] and consists of the following 4 steps: 

1. The momenta n{x,ii) are generated from the Gaussian distribution of their 
components in a basis of the Lie Algebra of SU(3). 

2. The two pseudo-fermion fields 0i and 02 are generated by global heatbath 
according to the probability distribution 

P(0,) « exp [-4 [a^ + (g2 + ^ exp{-i?ti?} . (2.24) 

This is done as usual by generating a complex Gaussian random vector R. For 
ai ^ we use 



(Q' + + af )^, = ar\Q - p\ + af){Q - ip,)R. (2.25) 
If (Tj = we use 

^i^{Q-ipdR- (2.26) 

3. The gauge links U (x, p) and the momenta 7r(a;, p) arc evolved along a trajectory 
of length r by integrating the molecular dynamics equations of motion with step- 
size 5t. This can be done in full analogy with [11]. In the equations of motion for 
the momenta we need the variation 5S-Pi of the pseudo-fermion actions under an 
infinitesimal change of the gauge link 5U{x, p). With the help of the vectors 

Xi = (Q' + pD"V., (2.27) 
Yi = {Q + ip,)X,, (2.28) 

defined on the odd sites of the lattice we construct over the full lattice the vectors 
X.= ( + 5M,,)-^M,,X, \ / -(1 + 5Mee)-^Me<,y, \ ^2 29) 



which we use to write 



5^F, = --{Y\5QXi + x\5QYi) . (2.30) 
Co 
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The variation of the two pseudo-fermion actions can be expressed in terms of the 
variation of the operator in the same way as for the standard version with 
one pseudo-fermion field. 

As integration scheme we use an improved Sexton- Weingarten [12] integrator 
with different time scales for the gauge part and for the pseudo-fermion part of the 
force which governs the evolution of the momenta. Our integration scheme is the 
same as the one used in [13] and partially removes the step-size errors 0((5t)^) 
in one integration step. 

4. Metropolis accept/reject step: the new configuration (tt', U'} is accepted with 
probability 

min{l,exp(-A//)} , (2.31) 

where 

A// = //(tt', U\ 4>i) - H{7r, U, 4>i) . (2.32) 

We monitor exp(— Ai?) to check for the correctness of the algorithm, in particular 
this quantity has proved to be sensitive to the required numerical reversibility in 
the integration of the equations of motion. For Metropolis-like algorithms it is 
possible to prove the general property 

(exp(-Ai/)) = 1. (2.33) 

We emphasize that this expectation value is sensitive to all proposed field config- 
urations not only to the accepted ones. 



3 Numerical results 

Wc present a numerical study of simulations of two degenerate masslcss quarks 
in the 0(a) on-shell improved Schrodinger functional using the algorithm pre- 
sented in the previous section. For the lattice temporal and spatial extensions, 
the background gauge field and the parameter 9 controlling the spatial boundary 
conditions of the fermion fields we set respectively 

T = L, C = C" = 0, ^ = 0.5. (3.1) 

(see [5] for more detailed definitions of these quantities). This choice is motivated 
by the fact that our algorithmic study is part of large scale simulations performed 
by the ALPHA collaboration to determine the running of the renormalized quark 
mass in two flavor QCD [4] . The massless theory is defined in the bare parameter 
space along the fine k — Hc{P) where the PCAC mass 

^ 1(^0 + do)fA{xo) + CAad;dofp(xo) 

2/p(^o) xo=T/2 
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vanishes^. Here, do and Oq are the forward and backward lattice derivatives, 
respectively, and ca denotes the coefficient multiplying the 0(a) improvement 
term in the improved axial current. For the definitions of the correlation functions 
/a and /p we follow [14]. We stress the fact that simulations along the massless 
line are possible since the Schrodinger functional has a natural infrared cut-off 
proportional to in the spectrum of the Dirac operator squared [8]. If the 

fluctuations of the gauge fleld are not too large this cut-off avoids the occurrence 
of small eigenvalues and thus makes simulations of the massless theory possible 
in not too large physical volumes. 

In the Schrodinger functional the renormalized coupling u — g^{L) is pre- 
sumably a monotonically growing function of L [15-17]. Keeping the physical size 
L constant is equivalent to holding the renormalized coupling fixed. We perform 
simulations approximately at couplings 

ii=1.0, 1.1, 1.2, 1.3, 1.5, 1.8, 2.0, 2.5, 3.3, 5.7, (3.3) 

and for each coupling at various lattice sizes L/a corresponding to different lattice 
resolutions. A summary of the bare parameters and the formulae used for the 0(a) 
improvement coefficients as functions of the bare gauge coupling can be found in 
Appendix A. Moreover we did some exploratory simulations at (5 — 5.2 and 
K = 0.1355 with lattice sizes L/a=6,8 and 12. The corresponding renormalized 
coupling is presently only known for L/a = 6 to he u = 4.7. The value k = 0.1355 
is the same as for the lightest quark mass used in the large volume simulations 
of Refs. [18, 19]. As long as a low energy reference scale is not determined, we 
cannot yet associate the u values with physical units of the box size L. We use 
the renormalized couplings to "label" our simulations. Indicatively the couplings 
in eq. (3.3) correspond to the range 10~^ fm . . . 1 fm. 

In the simulations we report on here the quantity we are primarily interested 
in is the renormalization constant of the pseudoscalar density Zp. For a definition 
of Zp see [14]. In the massless renormalization scheme that we employ, Zp depends 
on the size of the system and on the lattice resolution. We are interested in Zp 
since its running with the renormalization scale /i — 1/L determines the running 
of the renormalized quark mass m{ii): 

m{fi) _ ^.^ Zp{go,2L/a) 



m(/x/2) a/L^o Zp{go,L/a) 



(3.4) 

g^iL)=u 

^Note that many choices of matrix elements of the PCAC relation yield masses which agree 
up to O(a^) effects. The mass definitions can differ for instance in the choice of the parameters 
eq. (3.1), of xo eq. (3.2) or of the boundary states. 
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As a measure of the CPU cost of our simulations we take the integrated autocor- 
relation time of Zy> per lattice point 

Tint(^p) [sec] = Tint(^p) [MD] X ^md X (|)' . (3.5) 

We denote by rint(-^p) [MD] the integrated autocorrelation time [20] in units of 
molecular dynamics (MD) time. In eq. (3.5) ^md are the CPU seconds needed 
on one APEmille crate for one update move of one unit of MD time. Besides 
being machine-dependent Imd depends on the algorithm and on the step-size 5t. 
One APEmille crate consists of 128 nodes whose peak performance adds up to 
68 GFlops. A trivial volume factor is divided out in eq. (3.5). One call of the 
operator Q on APEmille costs roughly 12 ^ sec per lattice point (depending on 
communication). By dividing the value of rint(-^p) [sec] by 12 /i sec we get the 
integrated autocorrelation time expressed in the equivalent number of applications 
of Q. 

3.1 Numerical precision 

We carry out the numerical simulations in single precision arithmetics^. In this 
section we study effects of the numerical precision of our simulations using n^^ — 2 
pseudo-fermion fields by changing some of the algorithmic parameters. 

During the integration of the molecular dynamics equations of motion the 
inversions of the operators + and required in eq. (2.27) are performed 
with the conjugate gradient (CG) iteration algorithm. We use as a stopping 
criterion the relative residue which is defined as the square of the ratio between 
the norm of the residue vector and the norm of the present solution vector. The 
iteration stops whenever this number reaches the requested accuracy e^. The 
starting vector for each inversion along the trajectory is chosen to be zero. We 
first investigated the possibility of increasing the relative residue compared to the 
single precision value e'^ = 10^^'\ thereby saving CPU time. We emphasize that 
detailed balance holds for all values of e"^. In Table 1 we compare the results 
for the choice £^ = 10~^°. We observe a 2a deviation in Zp and an almost 5a 
deviation of (exp(— Aif)) from 1. 

We have recently learned that our definition of the relative residue £^ un- 
fortunately differs from the one commonly used in the numerical mathematics 
literature. There e"^ is defined as the square of the ratio between the norm of 
the residue vector and the norm of the source vector, which is independent of 
the normahzation of the operator inverted on the source vector. We checked the 

^Global sums are performed in double precision. 
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r £2 


ri„t(^p) [MD] 


plaq. 


am 


(exp(-Ai/)) 


1 10-^3 0.6236(17) 


22(8) 


0.68756(1) 


-0.00122(3) 


0.989(7) 


1 10-^° 0.6288(22) 


32(12) 


0.68756(1) 


-0.00118(3) 


1.038(8) 



Table 1: Comparison of simulation results for Upf = 2 with the relative residue 
set to the full single precision value e"^ = 10~^^ and to e"^ = 10"^*^. Results from 
L/a = 24 lattices at coupling 2.5. 



difference between these two definitions of £^ in the simulations of Table 1. When 
normalized using the solution vector reaches the requested precision 10~^^ or 
lO"^*^, the values of £^ normalized using the source vector are about 10 times 
larger. To reach = 10~^^ defined with the latter normalization it requires an 

increase of 25% in CG iterations. 

Relaxing the relative residue in the CG inversion leads to a systematic viola- 
tion of eq. (2.33). Since this possibly indicates the violation of reversibility in the 
integration of the equations of motion we investigate this issue in more detail in 
the next section. 

3.2 Reversibility 

The stability of the integration of molecular dynamics equations of motion in 
connection with Hybrid Monte Carlo algorithms for lattice QCD has been inves- 
tigated in the literature [21-23]. We report here on our experiences looking at the 
reversibility of the integration of the equations of motion with npf = 2 pseudo- 
fermion fields. We compare how quantities which we monitor to check reversibility 
behave when the lattice spacing or the trajectory length are changed. 

3.2.1 Diagnostics 

We perform reversibility tests by integrating over one trajectory of length r "for- 
ward" in molecular dynamics time, reversing at the end of the trajectory the 
signs of the momenta and integrating "backward" in time over one trajectory of 
the same length r. We obtain a cycle: 

U} ^ (vr', U'} ^ {-tt', U'} ^ {tt", U"} (3.6) 

Reversibility means that the final configuration {tt", f/"} is equal to the starting 
one {tt, U}. We quantify irreversibility by defining the quantities 

|Ai/(cycle)| = |if(7r, t/, 0,) - ^U", ^7", 0,)| (3.7) 
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^ 0.004 

< 

— 0.002 - 




Figure 1: Reversibility violations as functions of the lattice spacing for npf = 2. 
The renormalized coupling is 2.5. The trajectories used to compute the quantities 
in eq. (3.7) and eq. (3.8) have length r — 1. 



= ^ E \U{x,f,U,-U"{x,f,U,\\ (3.8) 

where c, c' are the color indices. The step-size is chosen as in our production runs 
to keep the acceptance close to 80% with t — 1 and the relative residue is set to 
£2 = 10-13. 

In Fig. 1 we show the irreversibility as a function of the lattice spacing. 
We plot averages of the quantities eq. (3.7) and eq. (3.8) computed on several 
configurations, which are well spaced in the Monte Carlo history of our production 
runs to avoid autocorrelations. The trajectories for these reversibility tests have 
all length r = 1, the step-size ranges from 1/5 on L/a = 6 to 1/16 on L/a — 24 
lattices. The renormalized coupling has the value 2.5, so increasing L/a decreases 
the lattice spacing by the same factor. 

In Fig. 2 we show the irreversibility as a function of the trajectory length r 
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Figure 2: Reversibility violations as functions of the trajectory length on L/a = 24 
lattices for rzpf = 2. The renormalized couplings are 1.1 (triangles), 2.5 (squares) 
and 5.7 (circles). The open symbols for r = 1, slightly displaced for clarity, are 
obtained by switching off the fermions during the MD evolution. 



used to compute the quantities eq. (3.7) and eq. (3.8). Varying r for fixed step- 
size gives us a handle to artificially change the amount of reversibility violations. 
We plot results for three different couplings 1.1 (triangles), 2.5 (squares) and 5.7 
(circles) on L/a = 24 lattices. The step-sizes are 1/12, 1/16 and 1/18 respectively. 
The quantities in eq. (3.7) and eq. (3.8) are averaged over 30 well spaced configu- 
rations taken from our production runs. The relative increase of |Aif(cycle)| and 
1 1 AC/ 1 1 with increasing r is of comparable magnitude. In general the irreversibility 
is slightly larger for larger coupling. 

For r = 1 we also considered the case when the fermions are switched off in the 
integration of the equations of motion. This corresponds to the open symbols in 
Fig. 2, which are slightly displaced horizontally to distinguish them. We conclude 
that the reversibility violations coming from the gauge part of the force which 
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governs the evolution of the momenta are not neghgible with respect to the ones 
coming from the pseudo-fermion part. 

3.2.2 Influence on observables 

To check that we can safely simulate a trajectory of length t — 1 with the relative 
residue in the CG iterations set to £^ = 10~^^, we repeated one simulation on 
L/a — 24 lattices at coupling 1.1 with trajectory length r = 0.5, keeping the 
other parameters unchanged. As it is shown in Fig. 2 the reversibility violations 
for r = 0.5 are smaller than for r = 1 and so we consider simulations with r = 0.5 
as a safe reference for the correct mean values of observables we are interested in. 

The results are shown in Table 2 and confirm our expectation. In addition 
we do not see a difference in efficiency between r = 0.5 and t — 1 within our 
errors on Tint(-^p)- 



r 




T^Zp) [MD] 


plaq. 


am 


(exp(-Ai/)) 


1 


10-13 0.7828(13) 


26(8) 


0.78876(1) 


-0.00076(2) 


1.012(10) 


0.5 


10-13 0.7813(13) 


24(8) 


0.78876(1) 


-0.00074(1) 


0.998(7) 



Table 2: Comparison of simulation results for Upf = 2 using trajectory length 
T — 1 and r = 0.5. Results stem from L/a — 24 lattices at couphng 1.1. 



From these results and from the ones in Section 3.1 we conclude that it is 
safe for our purposes to simulate with the relative residue set to £^ = lO^^^ in the 
CG inversions and with trajectory length t — 1. 

3.3 Performance 

In this section we compare the performance of our simulations using ripf = 2 
pseudo-fermion fields with simulations using n^i — 1. 





6t 


# traj. 


p 

acc 


plaq. 


(exp(-Ai/)) 


2 


1/18 


2744 


83% 


0.62725(1) 


1.020(9) 


1 


1/36 


160 


49% 


0.62726(5) 


0.81(12) 


1 


1/40 


168 


78% 


0.62731(5) 


0.96(4) 


1 


1/50 


168 


89% 


0.62732(3) 


1.036(27) 



Table 3: Comparison of the step-size between npf = 1 and ripf = 2. Results on 
L/a = 24 lattices at coupling 5.7. 
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Figure 3: Scaling of the average condition number (A;) of with [Tjaf'. Data 
for couphngs 1.0 and 1.1 combined (triangles), 2.5 (squares), 3.3 (pentagons) and 
5.7 (circles). 



First we look at the step-size which is required to get the same acceptance. 
The acceptances in our simulations typically scatter around 80%. Following [24,25] 
we assume that the acceptance probability Pace depends on the step-size br of the 
improved Sexton- Weingarten integration scheme like 

lnPacc~l-PaccCX (5r)^ (3.9) 

This formula takes into account what are expected to be the dominant discretiza- 
tion errors of the integration. We use eq. (3.9) for a small correction to St cor- 
responding to Pace = 80% precisely. Table 3 shows results for simulations on 
L/a = 2A lattices at coupling 5.7. The acceptance 80% is achieved with 5t = 1/17 
for ripf = 2 and 6t = 1/41 for ripf = 1. This is more than a factor two difference. 

In Fig. 3 we plot our available data for the average condition number (k) of the 
operator as function of (T/a)"^. Data are for couplings 1.0 and 1.1 combined 
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Figure 4: Scaling of the inverse step-size 1/6t for 80% acceptance with the average 
condition number (k) of Q^. Open symbols are for rZpf = 1, filled symbols and 
crosses for rzpf = 2 as explained in the text. 

(triangles), 2.5 (squares), 3.3 (pentagons) and 5.7 (circles). They confirm well 
the relation eq. (2.23) and we use it to extrapolate (k) where we do not have 
measurements of the smallest and highest eigenvalue of Q^. To guide the eye we 
draw the dashed line (k) = 10(T/a)^. 

In Fig. 4 the inverse step-size (i.e. the number of steps in one trajectory with 
r = 1) is plotted against (k) in a doubly logarithmic plot. The step-sizes have 
been corrected according to eq. (3.9) to normalize the acceptance to 0.80. Data 
are for couplings 1.0 and 1.1 combined (triangles), 2.5 (squares), 3.3 (pentagons) 
and 5.7 (circles). Open symbols are for Upf = 1 and filled symbols for Upf = 2. 
In addition we plot data for (3 = 5.2, k = 0.1355, L/a = 6, 8 and 12 runs 
(crosses) with Upf = 2. The points both for rZpf = 1 and npf = 2 fall on almost 
parallel straight lines showing remarkable scaling for the different couplings used. 
A parallel displacement in a doubly logarithmic plot corresponds to a change in 
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Q I I I I I I I I I I I I 

2 4 6 

u 

Figure 5: The integrated autocorrelation time rint(-Z^p) [MD] as function of the 
renormahzed couphng u. Data for L/a = 12 with ripf = 1 (open triangles), 
L/a = 12 with rZpf = 2 (filled triangles) and L/a = 24 with npf = 2 (filled circles) 
are shown together with linear fits and their la error bands. 

the pre-factor of the power law relating 1/St to (k). 

In Fig. 5 we plot the integrated autocorrelation time rint(^p) [MD] as func- 
tion of the renormalized coupling u. A careful analysis of the error of Zp and 
Tint{Zp) has been done using the method described in [20]. Data are for L/a = 12 
(triangles) and L/a = 24 (circles) lattices, open symbols refer to simulations with 
npf = 1 and filled symbols to ripf = 2. In the coupling range simulated there is 
no significant dependence on the coupling. We show in the plot the results with 
error bands of linear fits to the data. Similar fits have been done for the L/a = 16 
lattices not shown in Fig. 5. The fits are used to estimate the error of Zp, so 
their uncertainty influence the error of this error. We do not observe a significant 
difference between ripf = 1 and ripf = 2. This supports the results of [2] and the 
assumption made in [3]. 



15 



o 

9) 
m 



DL 



s 0.01 



0.001 




Figure 6: The integrated autocorrelation time rint(^p) [sec] as function of the 
renormahzed couphng u. Data for lattices L/a = 12 (triangles), L/a = 16 
(squares) and L/a = 24 (circles) are shown, open symbols are for Upf = 1 and 
filled symbols for Upf = 2. These data are obtained from eq. (3.5) taking the linear 
fits for n^tiZp) [MD]. 



To see the difference between ripf = 1 and ripf = 2 we plot in Fig. 6 the 
autocorrelation time rint(-^p) [sec] as function of the renormahzed coupling u. In 
these units rint(-^p) becomes a direct measure of the CPU cost of the algorithms. 
Data are for L/a = 12 (triangles), L/a = 16 (squares) and L/a = 24 (circles) 
lattices, open symbols are for ripf = 1 and filled symbols for npf = 2. The data 
shown are obtained from eq. (3.5) taking the values and errors of T[^t{Zp) [MD], 
which result from the linear fits in u. A systematic difference between npf = 1 and 
ripf = 2 is seen for the L/a = 16 lattices, the smallest couplings show a reduction 
of rint(^p) [sec] by about a factor two for ripf = 2. We did not run with ripf = 1 on 
L/a = 24 lattices because, according to our expectations based on the results for 
the L/a = 16 lattices, this would be demanding too much CPU time. In Fig. 6 
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we see that the value of rint(^p) [sec] for the L/a = 16 simulation with rzpf = 1 at 
the largest coupling 5.7 is within error almost compatible with the value for the 
L/a — 24 simulation with npf = 2 at the same couphng. The gain in CPU cost 
can be larger than a factor two for the L/a = 24 simulations when using ripf = 2 
instead of rzpf = 1, as supported by the results for the step-size in Table 3. 

Based on the step-sizes shown in Fig. 4 and the results for rint(-Z^p) [MD] 
shown in Fig. 5 we would expect a gain of a factor two in CPU cost for Hybrid 
Monte Carlo using n^f = 2 pseudo-fermion fields compared to npf = 1. There is 
a computational overhead for using ripf = 2 instead of npf = 1 pseudo-fermions, 
which comes from the inversion of -\- in addition to the inversion of Q^, 
see eq. (2.27). This overhead compared to the number of CG iterations required 
for the inversion of ranges from 20-30% on L/a = 12 lattices to 10-15% on 
L/a = 24 lattices and diminishes as the coupling u increases. This explains why 
the results for the L/a = 12 lattices shown in Fig. 6 indicate a smaller gain in 
CPU cost using npf = 2 than for larger lattices. 

One should keep in mind that these results have been obtained on small and 
intermediate physical volumes. The situation in larger volumes might be even 
better for ripf = 2, as our results for coupling 5.7 indicate. 

4 Conclusions 

We investigated the Hybrid Monte Carlo algorithm using two pseudo-fermion 
fields to simulate two massless flavors in the Schrodinger functional 0(a) improved 
theory. This study is part of large scale simulations to compute the running of 
the renormalized quark mass with the physical lattice size L from the running of 
the renormalization factor Zp of the pseudoscalar density. 

Our results show a gain in CPU cost of a factor two when using two instead 
of one pseudo-fermions. This gain comes from the larger step-size and is based on 
the results for the integrated autocorrelation time of Zp. Since our lattices have 
L f5i 1 fm at most, it might well be that the gain is larger on larger volumes [26]. 

Our simulations are performed with single precision arithmetics. We tested 
the effects of numerical precision and reversibility of the integration of the molecu- 
lar dynamics equations of motion on observables. We confirm that our simulations 
satisfy the requirements imposed both on numerical precision and reversibility. 
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A Simulation parameters 



L/a 


/3 


K 




L/a 


/3 


K 




6 


9.5000 





1315322 


6 


6.6085 





1352600 


8 


9.7341 





1313050 


8 


6.8217 





1348910 


12 


10.05755 





1310691 


12 


7.09300 





1344320 


6 


8.5000 





1325094 


6 


6.1330 





1361100 


8 


8.7223 





1322907 


8 


6.3229 





1357673 


12 


8.99366 





1319754 


12 


6.63164 





1352270 


6 


7.5000 





1338150 


6 


5.6215 





1366650 


8 


7.7206 





1334970 


8 


5.8097 





1366077 


12 


8.02599 





1330633 


12 


6.11816 





1361387 



Table 4: Summary of simulation parameters. 



In Table 4 we list the lattice sizes and bare parameters of our simulations. These 
are divided in six groups of three sets each. In each group j3 and k have been tuned 
for the lattice sizes L/a = 6,8, 12 to keep the renormalized coupling g'^{L) at an 
approximately fixed vahie u (see Ref. [15] for a precise definition of g'^{L)), in the 
left column u — 1.0, 1.2, 1.5, in the right column u — 2.0, 2.5, 3.3 in the order from 
top to bottom. For each bare parameter set we also simulated the doubled lattice 
sizes 2L/a. The renormalized coupling then takes the values u — 1.1, 1.3, 1.8 and 
u = 2.5, 3.3, 5.7 in the same ordering as before. 

The 0(a) boundary improvement coefficient of the Wilson plaquette gauge 
action Ct is set to the perturbative value [27] 

ct = 1 - 0.050718^^ - 0.030^^ . (A.l) 

For the left part of Table 4 except in the simulations with (3 = 7.7206 and Kc — 
0.1334970 we take the 1-loop expression, all the other simulations use the 2-loop 
expression of Ct. 

As concerns the 0(a) improvement of the Wilson-Dirac operator, the clover 
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coefficient Csw is computed using the non-perturbative fit-formula given in [28] 

1 - 0.454gg - 0.175(?o^ + 0.012ffg + 0.045ffg 
""'^ ~ 1-0.720^0' ■ 

The coefficient Ct is set to the 1-loop perturbative value [29] 

ct = 1 -0.01795c/^. (A.3) 

Finally we use for the 0(a) improvement coefficient of the axial current ca 
the 1-loop perturbative value [29] 

Ca = -0.007573^^. (A.4) 
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